{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0af3ec93",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/ChromaIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "307804a3-c02b-4a57-ac0d-172c30ddc851",
   "metadata": {},
   "source": [
    "# Chroma\n",
    "\n",
    ">[Chroma](https://docs.trychroma.com/getting-started) is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.\n",
    "\n",
    "<a href=\"https://discord.gg/MMeYNTmh3x\" target=\"_blank\">\n",
    "      <img src=\"https://img.shields.io/discord/1073293645303795742\" alt=\"Discord\">\n",
    "  </a>&nbsp;&nbsp;\n",
    "  <a href=\"https://github.com/chroma-core/chroma/blob/master/LICENSE\" target=\"_blank\">\n",
    "      <img src=\"https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white\" alt=\"License\">\n",
    "  </a>&nbsp;&nbsp;\n",
    "  <img src=\"https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml/badge.svg?branch=main\" alt=\"Integration Tests\">\n",
    "\n",
    "- [Website](https://www.trychroma.com/)\n",
    "- [Documentation](https://docs.trychroma.com/)\n",
    "- [Twitter](https://twitter.com/trychroma)\n",
    "- [Discord](https://discord.gg/MMeYNTmh3x)\n",
    "\n",
    "Chroma is fully-typed, fully-tested and fully-documented.\n",
    "\n",
    "Install Chroma with:\n",
    "\n",
    "```sh\n",
    "pip install chromadb\n",
    "```\n",
    "\n",
    "Chroma runs in various modes. See below for examples of each integrated with LangChain.\n",
    "- `in-memory` - in a python script or jupyter notebook\n",
    "- `in-memory with persistance` - in a script or notebook and save/load to disk\n",
    "- `in a docker container` - as a server running your local machine or in the cloud\n",
    "\n",
    "Like any other database, you can: \n",
    "- `.add` \n",
    "- `.get` \n",
    "- `.update`\n",
    "- `.upsert`\n",
    "- `.delete`\n",
    "- `.peek`\n",
    "- and `.query` runs the similarity search.\n",
    "\n",
    "View full docs at [docs](https://docs.trychroma.com/reference/Collection). "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b5331b6b",
   "metadata": {},
   "source": [
    "## Basic Example\n",
    "\n",
    "In this basic example, we take the Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "54361467",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ffe7d98",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "#### Creating a Chroma Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3df0b97",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install llama-index chromadb --quiet\n",
    "# !pip install chromadb\n",
    "# !pip install sentence-transformers\n",
    "# !pip install pydantic==1.10.11"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d48af8e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import\n",
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext\n",
    "from llama_index.vector_stores import ChromaVectorStore\n",
    "from llama_index.storage.storage_context import StorageContext\n",
    "from llama_index.embeddings import HuggingFaceEmbedding\n",
    "from IPython.display import Markdown, display\n",
    "import chromadb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "374a148b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set up OpenAI\n",
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
    "import openai\n",
    "\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7b9a55de",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01f19bc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "667f3cb3-ce18-48d5-b9aa-bfc1a1f0f0f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "/Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.\n",
      "  warn(\"The installed version of bitsandbytes was compiled without GPU support. \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'NoneType' object has no attribute 'cadam32bit_grad_fp32'\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "<b>The author worked on writing and programming growing up. They wrote short stories and tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming more extensively.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# create client and a new collection\n",
    "chroma_client = chromadb.EphemeralClient()\n",
    "chroma_collection = chroma_client.create_collection(\"quickstart\")\n",
    "\n",
    "# define embedding function\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-base-en-v1.5\")\n",
    "\n",
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "\n",
    "# set up ChromaVectorStore and load in data\n",
    "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context, service_context=service_context\n",
    ")\n",
    "\n",
    "# Query Data\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "349de571",
   "metadata": {},
   "source": [
    "## Basic Example (including saving to disk)\n",
    "\n",
    "Extending the previous example, if you want to save to disk, simply initialize the Chroma client and pass the directory where you want the data to be saved to. \n",
    "\n",
    "`Caution`: Chroma makes a best-effort to automatically save data to disk, however multiple in-memory clients can stomp each other's work. As a best practice, only have one client per path running at any given time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c3a56a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The author worked on writing and programming growing up. They wrote short stories and tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming games and a word processor.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# save to disk\n",
    "\n",
    "db = chromadb.PersistentClient(path=\"./chroma_db\")\n",
    "chroma_collection = db.get_or_create_collection(\"quickstart\")\n",
    "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context, service_context=service_context\n",
    ")\n",
    "\n",
    "# load from disk\n",
    "db2 = chromadb.PersistentClient(path=\"./chroma_db\")\n",
    "chroma_collection = db2.get_or_create_collection(\"quickstart\")\n",
    "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
    "index = VectorStoreIndex.from_vector_store(\n",
    "    vector_store,\n",
    "    service_context=service_context,\n",
    ")\n",
    "\n",
    "# Query Data from the persisted index\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d596e475",
   "metadata": {},
   "source": [
    "## Basic Example (using the Docker Container)\n",
    "\n",
    "You can also run the Chroma Server in a Docker container separately, create a Client to connect to it, and then pass that to LlamaIndex. \n",
    "\n",
    "Here is how to clone, build, and run the Docker Image:\n",
    "```\n",
    "git clone git@github.com:chroma-core/chroma.git\n",
    "docker-compose up -d --build\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6c9bd64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create the chroma client and add our data\n",
    "import chromadb\n",
    "\n",
    "remote_db = chromadb.HttpClient()\n",
    "chroma_collection = remote_db.get_or_create_collection(\"quickstart\")\n",
    "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context, service_context=service_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88e10c26",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "Growing up, the author wrote short stories, programmed on an IBM 1401, and wrote programs on a TRS-80 microcomputer. He also took painting classes at Harvard and worked as a de facto studio assistant for a painter. He also tried to start a company to put art galleries online, and wrote software to build online stores.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Query Data from the Chroma Docker index\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0a0e79f7",
   "metadata": {},
   "source": [
    "## Update and Delete\n",
    "\n",
    "While building toward a real application, you want to go beyond adding data, and also update and delete data. \n",
    "\n",
    "Chroma has users provide `ids` to simplify the bookkeeping here. `ids` can be the name of the file, or a combined has like `filename_paragraphNumber`, etc.\n",
    "\n",
    "Here is a basic example showing how to do various operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9411826",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'_node_content': '{\"id_\": \"be08c8bc-f43e-4a71-ba64-e525921a8319\", \"embedding\": null, \"metadata\": {}, \"excluded_embed_metadata_keys\": [], \"excluded_llm_metadata_keys\": [], \"relationships\": {\"1\": {\"node_id\": \"2cbecdbb-0840-48b2-8151-00119da0995b\", \"node_type\": null, \"metadata\": {}, \"hash\": \"4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35\"}, \"3\": {\"node_id\": \"6a75604a-fa76-4193-8f52-c72a7b18b154\", \"node_type\": null, \"metadata\": {}, \"hash\": \"d6c408ee1fbca650fb669214e6f32ffe363b658201d31c204e85a72edb71772f\"}}, \"hash\": \"b4d0b960aa09e693f9dc0d50ef46a3d0bf5a8fb3ac9f3e4bcf438e326d17e0d8\", \"text\": \"\", \"start_char_idx\": 0, \"end_char_idx\": 4050, \"text_template\": \"{metadata_str}\\\\n\\\\n{content}\", \"metadata_template\": \"{key}: {value}\", \"metadata_seperator\": \"\\\\n\"}', 'author': 'Paul Graham', 'doc_id': '2cbecdbb-0840-48b2-8151-00119da0995b', 'document_id': '2cbecdbb-0840-48b2-8151-00119da0995b', 'ref_doc_id': '2cbecdbb-0840-48b2-8151-00119da0995b'}\n",
      "count before 20\n",
      "count after 19\n"
     ]
    }
   ],
   "source": [
    "doc_to_update = chroma_collection.get(limit=1)\n",
    "doc_to_update[\"metadatas\"][0] = {\n",
    "    **doc_to_update[\"metadatas\"][0],\n",
    "    **{\"author\": \"Paul Graham\"},\n",
    "}\n",
    "chroma_collection.update(\n",
    "    ids=[doc_to_update[\"ids\"][0]], metadatas=[doc_to_update[\"metadatas\"][0]]\n",
    ")\n",
    "updated_doc = chroma_collection.get(limit=1)\n",
    "print(updated_doc[\"metadatas\"][0])\n",
    "\n",
    "# delete the last document\n",
    "print(\"count before\", chroma_collection.count())\n",
    "chroma_collection.delete(ids=[doc_to_update[\"ids\"][0]])\n",
    "print(\"count after\", chroma_collection.count())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index",
   "language": "python",
   "name": "llama-index"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  },
  "vscode": {
   "interpreter": {
    "hash": "0ac390d292208ca2380c85f5bce7ded36a7a25670a97c40b8009630eb36cb06e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
